Overview

Dataset statistics

Number of variables: 11
Number of observations: 699
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 1.1%
Total size in memory: 60.2 KiB
Average record size in memory: 88.2 B

Variable types

NUM: 9
CAT: 2

Reproduction

Analysis started: 2020-07-01 08:37:52.478772
Analysis finished: 2020-07-01 08:39:11.829779
Duration: 1 minute and 19.35 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 8 (1.1%) duplicate rows (Duplicates)
3 is highly correlated with 2 (High correlation)
2 is highly correlated with 3 (High correlation)

Variables

0
Real number (ℝ≥0)

Distinct count: 645
Unique (%): 92.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1071704.0987124464
Minimum: 61634
Maximum: 13454352
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 61634
5-th percentile: 411453
Q1: 870688.5
Median: 1171710
Q3: 1238298
95-th percentile: 1333890.8
Maximum: 13454352
Range: 13392718
Interquartile range (IQR): 367609.5

Descriptive statistics

Standard deviation: 617095.7298
Coefficient of variation (CV): 0.5758079404
Kurtosis: 257.7171591
Mean: 1071704.099
Median Absolute Deviation (MAD): 104381
Skewness: 13.67532594
Sum: 749121165
Variance: 3.808071398e+11

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
1182404             6      0.9%
1276091             5      0.7%
1198641             3      0.4%
466906              2      0.3%
1116116             2      0.3%
1070935             2      0.3%
385103              2      0.3%
1293439             2      0.3%
1240603             2      0.3%
1277792             2      0.3%
Other values (635)  671    96.0%

Value               Count  Frequency (%)
61634               1      0.1%
63375               1      0.1%
76389               1      0.1%
95719               1      0.1%
128059              1      0.1%

Value               Count  Frequency (%)
13454352            1      0.1%
8233704             1      0.1%
1371920             1      0.1%
1371026             1      0.1%
1369821             1      0.1%

1
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.417739628040057
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.815740659
Coefficient of variation (CV): 0.6373713473
Kurtosis: -0.6237154123
Mean: 4.417739628
Median Absolute Deviation (MAD): 2
Skewness: 0.5928585327
Sum: 3088
Variance: 7.928395456

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
1      145    20.7%
5      130    18.6%
3      108    15.5%
4      80     11.4%
10     69     9.9%
2      50     7.2%
8      46     6.6%
6      34     4.9%
7      23     3.3%
9      14     2.0%

Value  Count  Frequency (%)
1      145    20.7%
2      50     7.2%
3      108    15.5%
4      80     11.4%
5      130    18.6%

Value  Count  Frequency (%)
10     69     9.9%
9      14     2.0%
8      46     6.6%
7      23     3.3%
6      34     4.9%

2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.13447782546495
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.05145911
Coefficient of variation (CV): 0.9735143395
Kurtosis: 0.09880288537
Mean: 3.134477825
Median Absolute Deviation (MAD): 0
Skewness: 1.233136558
Sum: 2191
Variance: 9.3114027

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
1      384    54.9%
10     67     9.6%
3      52     7.4%
2      45     6.4%
4      40     5.7%
5      30     4.3%
8      29     4.1%
6      27     3.9%
7      19     2.7%
9      6      0.9%

Value  Count  Frequency (%)
1      384    54.9%
2      45     6.4%
3      52     7.4%
4      40     5.7%
5      30     4.3%

Value  Count  Frequency (%)
10     67     9.6%
9      6      0.9%
8      29     4.1%
7      19     2.7%
6      27     3.9%

3
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.207439198855508
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.971912767
Coefficient of variation (CV): 0.9265686995
Kurtosis: 0.007010980047
Mean: 3.207439199
Median Absolute Deviation (MAD): 0
Skewness: 1.161859179
Sum: 2242
Variance: 8.832265496

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
1      353    50.5%
2      59     8.4%
10     58     8.3%
3      56     8.0%
4      44     6.3%
5      34     4.9%
7      30     4.3%
6      30     4.3%
8      28     4.0%
9      7      1.0%

Value  Count  Frequency (%)
1      353    50.5%
2      59     8.4%
3      56     8.0%
4      44     6.3%
5      34     4.9%

Value  Count  Frequency (%)
10     58     8.3%
9      7      1.0%
8      28     4.0%
7      30     4.3%
6      30     4.3%

4
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.8068669527896994
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.855379239
Coefficient of variation (CV): 1.017283429
Kurtosis: 0.9879470695
Mean: 2.806866953
Median Absolute Deviation (MAD): 0
Skewness: 1.524468091
Sum: 1962
Variance: 8.1531906

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
1      407    58.2%
3      58     8.3%
2      58     8.3%
10     55     7.9%
4      33     4.7%
8      25     3.6%
5      23     3.3%
6      22     3.1%
7      13     1.9%
9      5      0.7%

Value  Count  Frequency (%)
1      407    58.2%
2      58     8.3%
3      58     8.3%
4      33     4.7%
5      23     3.3%

Value  Count  Frequency (%)
10     55     7.9%
9      5      0.7%
8      25     3.6%
7      13     1.9%
6      22     3.1%

5
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.216022889842632
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.214299887
Coefficient of variation (CV): 0.6885211836
Kurtosis: 2.169066423
Mean: 3.21602289
Median Absolute Deviation (MAD): 0
Skewness: 1.712171802
Sum: 2248
Variance: 4.903123988

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
2      386    55.2%
3      72     10.3%
4      48     6.9%
1      47     6.7%
6      41     5.9%
5      39     5.6%
10     31     4.4%
8      21     3.0%
7      12     1.7%
9      2      0.3%

Value  Count  Frequency (%)
1      47     6.7%
2      386    55.2%
3      72     10.3%
4      48     6.9%
5      39     5.6%

Value  Count  Frequency (%)
10     31     4.4%
9      2      0.3%
8      21     3.0%
7      12     1.7%
6      41     5.9%

6
Categorical

Distinct count: 11
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

1                 402
10                132
5                 30
2                 30
3                 28
Other values (6)  77

Value  Count  Frequency (%)
1      402    57.5%
10     132    18.9%
5      30     4.3%
2      30     4.3%
3      28     4.0%
8      21     3.0%
4      19     2.7%
?      16     2.3%
9      9      1.3%
7      8      1.1%

Length

Max length: 2
Median length: 1
Mean length: 1.188841202
Min length: 1

7
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4377682403433476
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.438364252
Coefficient of variation (CV): 0.7092869798
Kurtosis: 0.1846213115
Mean: 3.43776824
Median Absolute Deviation (MAD): 1
Skewness: 1.099969082
Sum: 2403
Variance: 5.945620227

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
2      166    23.7%
3      165    23.6%
1      152    21.7%
7      73     10.4%
4      40     5.7%
5      34     4.9%
8      28     4.0%
10     20     2.9%
9      11     1.6%
6      10     1.4%

Value  Count  Frequency (%)
1      152    21.7%
2      166    23.7%
3      165    23.6%
4      40     5.7%
5      34     4.9%

Value  Count  Frequency (%)
10     20     2.9%
9      11     1.6%
8      28     4.0%
7      73     10.4%
6      10     1.4%

8
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.866952789699571
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.053633894
Coefficient of variation (CV): 1.065114816
Kurtosis: 0.4742686755
Mean: 2.86695279
Median Absolute Deviation (MAD): 0
Skewness: 1.422261257
Sum: 2004
Variance: 9.324679956

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
1      443    63.4%
10     61     8.7%
3      44     6.3%
2      36     5.2%
8      24     3.4%
6      22     3.1%
5      19     2.7%
4      18     2.6%
9      16     2.3%
7      16     2.3%

Value  Count  Frequency (%)
1      443    63.4%
2      36     5.2%
3      44     6.3%
4      18     2.6%
5      19     2.7%

Value  Count  Frequency (%)
10     61     8.7%
9      16     2.3%
8      24     3.4%
7      16     2.3%
6      22     3.1%

9
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5894134477825466
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.715077943
Coefficient of variation (CV): 1.07906344
Kurtosis: 12.65787807
Mean: 1.589413448
Median Absolute Deviation (MAD): 0
Skewness: 3.560657844
Sum: 1111
Variance: 2.941492349

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
1      579    82.8%
2      35     5.0%
3      33     4.7%
10     14     2.0%
4      12     1.7%
7      9      1.3%
8      8      1.1%
5      6      0.9%
6      3      0.4%

Value  Count  Frequency (%)
1      579    82.8%
2      35     5.0%
3      33     4.7%
4      12     1.7%
5      6      0.9%

Value  Count  Frequency (%)
10     14     2.0%
8      8      1.1%
7      9      1.3%
6      3      0.4%
5      6      0.9%

10
Categorical

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

Value  Count  Frequency (%)
2      458    65.5%
4      241    34.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
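The calculation described above can be sketched in plain Python. This is an illustrative sketch only, not the code the report generator runs; the function name pearson_r is hypothetical. The sample values are columns 2 and 3 of the ten rows listed under "First rows", so the result describes only those ten rows, not the full 699-row dataset.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/(n-1) factors in the covariance and in the two standard
    deviations cancel, so only sums of deviations are needed.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


# Columns 2 and 3 of the ten rows shown under "First rows".
col2 = [1, 4, 1, 8, 1, 10, 1, 1, 1, 2]
col3 = [1, 4, 1, 8, 1, 10, 1, 2, 1, 1]
print(pearson_r(col2, col3))  # about 0.99 for these ten rows
```

Rescaling or shifting either input, for example passing [3 * v + 7 for v in col2], leaves the result unchanged up to floating-point rounding, which is the invariance property mentioned above.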

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
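A minimal sketch of that calculation: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. The function names are hypothetical, and giving tied values the average of the ranks they span is an assumption (it is the usual convention, but the report text does not specify tie handling).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)


# A nonlinear but strictly increasing relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.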

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
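The pair-counting definition above corresponds to the variant usually called tau-a, sketched below with a hypothetical function name. Statistical libraries often default to a tie-adjusted variant instead (for example, tau-b in scipy.stats.kendalltau), so values from this sketch can differ from library output when the data contain ties, as the 1 to 10 valued columns in this dataset do.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; the pair counts as neither
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three: (2 - 1) / 3.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```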

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. See the Phik documentation for details.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
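As an illustration, a bias-corrected Cramér's V can be sketched as follows. The code uses the commonly cited form of Bergsma's correction, which shrinks both φ² and the table dimensions by terms of order 1/(n-1). It is an assumption that the report generator uses exactly this form, and the function name is hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi squared and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)


labels = ["a", "b"] * 50
print(cramers_v_corrected(labels, labels))  # perfect association
```

For independent variables the uncorrected φ² is typically slightly above 0 in finite samples; subtracting (k-1)(r-1)/(n-1) and clipping at 0 is what removes that upward bias.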

Missing values

Sample

First rows

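     0         1   2   3   4   5   6   7   8   9   10
0    1000025   5   1   1   1   2   1   3   1   1   2
1    1002945   5   4   4   5   7   10  3   2   1   2
2    1015425   3   1   1   1   2   2   3   1   1   2
3    1016277   6   8   8   1   3   4   3   7   1   2
4    1017023   4   1   1   3   2   1   3   1   1   2
5    1017122   8   10  10  8   7   10  9   7   1   4
6    1018099   1   1   1   1   2   10  3   1   1   2
7    1018561   2   1   2   1   2   1   3   1   1   2
8    1033078   2   1   1   1   2   1   1   1   5   2
9    1033078   4   2   1   1   2   1   2   1   1   2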

Last rows

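     0         1   2   3   4   5   6   7   8   9   10
689  654546    1   1   1   1   2   1   1   1   8   2
690  654546    1   1   1   3   2   1   1   1   1   2
691  695091    5   10  10  5   4   5   4   4   1   4
692  714039    3   1   1   1   2   1   1   1   1   2
693  763235    3   1   1   1   2   1   2   1   2   2
694  776715    3   1   1   1   3   2   1   1   1   2
695  841769    2   1   1   1   2   1   1   1   1   2
696  888820    5   10  10  3   7   3   8   10  2   4
697  897471    4   8   6   4   3   4   10  6   1   4
698  897471    4   8   8   5   4   5   10  4   1   4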

Duplicate rows

Most frequent

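     0         1   2   3   4   5   6   7   8   9   10  count
0    320675    3   3   5   2   3   10  7   1   1   4   2
1    466906    1   1   1   1   2   1   1   1   1   2   2
2    704097    1   1   1   1   1   1   2   1   1   2   2
3    1100524   6   10  10  2   8   10  7   3   3   4   2
4    1116116   9   10  10  1   10  8   3   3   1   4   2
5    1198641   3   1   1   1   2   1   3   1   1   2   2
6    1218860   1   1   1   1   1   1   3   1   1   2   2
7    1321942   5   1   1   1   2   1   3   1   1   2   2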